Skip to content

Conversation

@aarongreig
Copy link
Contributor

@aarongreig aarongreig commented Nov 29, 2023

This lines up better with the original PI error code and what it was meant to convey.

Also clarify the spec around return codes for urProgramGetFunctionPointer and update the relevant CTS test.

LLVM testing intel/llvm#12263

@aarongreig aarongreig requested review from a team as code owners November 29, 2023 12:19
@aarongreig aarongreig force-pushed the aaron/clarifyProgramGetFunctionPointer branch 2 times, most recently from 210120d to 98624fc Compare November 29, 2023 16:29
@fabiomestre fabiomestre changed the base branch from adapters to main December 5, 2023 16:44
@fabiomestre
Copy link
Contributor

I have updated the target branch of this PR from the adapters branch to the main branch.
Development in UR is moving back to main. The adapters branch will soon be deleted.

@aarongreig aarongreig force-pushed the aaron/clarifyProgramGetFunctionPointer branch 2 times, most recently from 18d229b to 2dbd1e0 Compare December 29, 2023 09:59
@codecov-commenter
Copy link

codecov-commenter commented Dec 29, 2023

Codecov Report

Attention: 4 lines in your changes are missing coverage. Please review.

Comparison is base (fb83446) 15.72% compared to head (0667a9a) 15.73%.

Files Patch % Lines
include/ur_print.hpp 0.00% 2 Missing ⚠️
...onformance/program/urProgramGetFunctionPointer.cpp 0.00% 2 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1136   +/-   ##
=======================================
  Coverage   15.72%   15.73%           
=======================================
  Files         223      223           
  Lines       31475    31476    +1     
  Branches     3557     3557           
=======================================
+ Hits         4951     4954    +3     
+ Misses      26473    26472    -1     
+ Partials       51       50    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@aarongreig
Copy link
Contributor Author

friendly ping @oneapi-src/unified-runtime-maintain @oneapi-src/unified-runtime-opencl-write @oneapi-src/unified-runtime-cuda-write @oneapi-src/unified-runtime-native-cpu-write @oneapi-src/unified-runtime-hip-write @oneapi-src/unified-runtime-level-zero-write @oneapi-src/unified-runtime-command-buffer-write the changes are insignificant despite apparently touching absolutely everything

@aarongreig aarongreig force-pushed the aaron/clarifyProgramGetFunctionPointer branch from 0667a9a to f19816d Compare January 9, 2024 15:55
Copy link
Contributor

@PietroGhg PietroGhg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Native CPU lgtm, thank you

@ldrumm
Copy link
Contributor

ldrumm commented Jan 9, 2024

What happens if the function name can't be found in the program?

@aarongreig
Copy link
Contributor Author

What happens if the function name can't be found in the program?

we use UR_RESULT_ERROR_INVALID_KERNEL_NAME for that

Copy link
Contributor

@nrspruit nrspruit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Level zero LGTM

@aarongreig
Copy link
Contributor Author

ping @oneapi-src/unified-runtime-cuda-write @oneapi-src/unified-runtime-hip-write

if (Ret == CUDA_ERROR_NOT_FOUND) {
*ppFunctionPointer = 0;
Result = UR_RESULT_ERROR_INVALID_FUNCTION_NAME;
Result = UR_RESULT_ERROR_FUNCTION_ADDRESS_NOT_AVAILABLE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks to me like this mapping isn't quite right:

UR_RESULT_ERROR_FUNCTION_ADDRESS_NOT_AVAILABLE
If pFunctionName could be located, but its address couldn't be retrieved.

But see :

To me it seems contradictory for the functionname to be located but it's address not being able to be retrieved when cuda docs state that no function with that name exists?

I think what you have here is fine from a general point of view wrt cuda/hip backends: you have a cuda/hip error that can be returned from multiple driver functions, but it's semantics change depending on which function returned it, so you are mapping it to a UR error that gives more info. I think this is a fine idea (although normally I think it is better to return plugin specific errors with the native error in cases where the native error is more clear than a UR mapping); it just looks like the mapping isn't quite right currently.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only just read @ldrumm s comment which is making the point a lot more succinctly: #1136 (comment)

Copy link
Contributor

@JackAKirk JackAKirk Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this further. Maybe it is better to append e.g.:

 + void setPluginSpecificMessage(CUresult cu_res, UR_RESULT UR_res) {
- void setPluginSpecificMessage(CUresult cu_res) {
  const char *error_string;
  const char *error_name;
  cuGetErrorName(cu_res, &error_name);
  cuGetErrorString(cu_res, &error_string);
  char *message = (char *)malloc(strlen(error_string) + strlen(error_name) + 2);
  strcpy(message, error_name);
  strcat(message, "\n");
  strcat(message, error_string);

+ setErrorMessage(message, UR_res);
- setErrorMessage(message, UR_RESULT_ERROR_ADAPTER_SPECIFIC);
  free(message);
}

from

Then this gives the option of passing in a UR_RESULT (or whatever it is called) different from UR_RESULT_ERROR_ADAPTER_SPECIFIC (in such cases that there is a clarifying mapping as I described in my first comment).
Then you can

  • preserve the native error
  • clarify it by providing a UR mapping if one exists
  • If one doesn't exist (which might be the case in this example from your code) then you can pass in UR_RESULT_ERROR_ADAPTER_SPECIFIC, possibly with an accompanying message.

If such a protocol were decided across backends, then documented, this would make it easier to prevent such errors as this example in the future (i.e making mappings from native errors to UR that don't make sense), and provide a consistent error message framework for contributors to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah FUNCTION_ADDRESS_NOT_AVAILABLE is a CL exclusive failure mode, it must have proliferated when it got renamed to be more generic, I've replaced all uses of it outside of the CL adapter with INVALID_KERNEL_NAME

This lines up better with the original PI error code and what it was
meant to convey.

Also clarify the spec around return codes for
urProgramGetFunctionPointer and update the relevant CTS test.
@aarongreig aarongreig force-pushed the aaron/clarifyProgramGetFunctionPointer branch from f19816d to 11d5613 Compare February 7, 2024 10:55
@aarongreig
Copy link
Contributor Author

merged into #1212

@aarongreig aarongreig closed this Feb 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants